
PLOS Digital Health

Public Library of Science (PLoS)

Preprints posted in the last 30 days, ranked by how well they match PLOS Digital Health's content profile, based on 91 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit.

1
Who is leading medical AI? A systematic review and scientometric analysis of chest x-ray research

Vasquez-Venegas, C.; Chewcharat, A.; Kimera, R.; Kurtzman, N.; Leite, M.; Woite, N. L.; Muppidi, I. J.; Muppidi, R. J.; Liu, X.; Ong, E. P.; Pal, R.; Myers, C.; Salzman, S.; Patscheider, J. S.; John, T. R.; Rogers, M.; Samuel, M.; Santana-Guerrero, J. L.; Yaacob, S.; Gameiro, R. R.; Celi, L. A.

2026-04-07 health informatics 10.64898/2026.04.02.26349884 medRxiv
Top 0.1%
51.7%

Computer vision models for chest X-ray interpretation hold significant promise for global healthcare, but their clinical value depends on equitable development across diverse populations. We conducted a scientometric analysis to examine authorship patterns, geographic distribution, and dataset origins to assess potential disparities that could affect clinical applicability. We systematically reviewed literature on computer vision applications for chest X-rays published between 2017 and 2025 across multiple databases, including PubMed, Embase, and SciELO. Using the Dimensions API and manual extraction, we analyzed 928 eligible studies, examining first and senior author affiliations, institutional contributions, dataset provenance, and collaboration patterns across different income classifications based on World Bank categories. High-income countries dominated research leadership, representing 55.6% of first authors and 59.7% of senior authors; no first authors were affiliated with low-income countries. China (16.93%) and the United States (16.72%) led in first authorship positions. Most datasets (73.6%) originated from high-income settings, with the United States being the largest contributor (40.45%). Private datasets were most frequently used (20.52%). Cross-income collaborations were rare, with only 3.9% of publications involving partnerships between high-income and lower-middle-income countries. Findings reveal substantial disparities in who shapes computer vision research on chest X-rays and which populations are represented in training data. These imbalances risk developing AI systems that perform inconsistently across diverse healthcare settings, potentially exacerbating healthcare inequities. Addressing these disparities requires coordinated efforts to develop globally representative datasets, establish equitable international collaborations, and implement policies that promote inclusive research practices.

2
CardioAI: An Explainable Machine Learning System for Cardiovascular Risk Prediction and Patient Retention in Nigerian Healthcare Settings

Gboh-Igbara, D. C.

2026-03-31 rehabilitation medicine and physical therapy 10.64898/2026.03.29.26349642 medRxiv
Top 0.1%
51.5%

Background: Cardiovascular disease is the leading cause of mortality in Nigeria and across sub-Saharan Africa, with rising incidence attributable to urbanisation, sedentary lifestyles, and limited access to early detection tools. Concurrently, patient dropout from rehabilitation programs remains a critical operational challenge for Nigerian clinics, with many patients failing to return after their initial consultation. Methods: We developed CardioAI, an Explainable Artificial Intelligence system comprising two predictive modules. The cardiovascular risk module trained four machine learning models - Logistic Regression, Random Forest, Gradient Boosting (XGBoost), and a Multilayer Perceptron - on a combined UCI Heart Disease dataset of 1,025 patient records. A novel Lifestyle Risk Index was engineered from five modifiable clinical markers. SHAP (SHapley Additive exPlanations) was applied for per-prediction feature attribution. The patient retention module trained three classifiers on a synthetic dataset of 800 records, modelling 10 operational and behavioural dropout factors. An NLP and OCR pipeline using Tesseract v5.5 and spaCy was implemented for clinical document processing. Results: The cardiovascular risk module achieved an AUC-ROC of 0.999 (XGBoost), 0.998 (Random Forest), 0.994 (MLP), and 0.927 (Logistic Regression) on the held-out test set. Cross-validated AUC with constrained tree depth was 0.97, confirming generalisation. SHAP analysis identified the Lifestyle Risk Index, ST depression, resting blood pressure, exercise-induced angina, and cholesterol as the five most influential predictors. The retention module achieved AUC-ROC of 0.66 (Logistic Regression), demonstrating the difficulty of dropout prediction with synthetic data. Conclusions: CardioAI demonstrates that explainable machine learning can provide clinically actionable cardiovascular risk assessment and patient retention intelligence in a low-resource Nigerian healthcare context. 
The system is freely deployable, open-source, and designed for pilot validation in teaching hospitals across Lagos and Port Harcourt. Keywords: cardiovascular risk prediction, machine learning, explainable AI, SHAP, patient retention, clinical decision support, Nigeria, sub-Saharan Africa, XGBoost, random forest, digital health

3
Co-creating data science solutions for maternal and child health decision-making in tribal primary health centres: an action research using the Three Co's Framework

Mitra, A.; Jayaraman, G.; Ondopu, B.; Malisetty, S. K.; Niranjan, R.; Shaik, S.; Soman, B.; Gaitonde, R.; Bhatnagar, T.; Niehaus, E.; K.S, S.; Roy, A.

2026-03-31 public and global health 10.64898/2026.03.29.26349643 medRxiv
Top 0.1%
25.9%

Background: Digital health tools are increasingly promoted for strengthening health information systems in low- and middle-income countries, yet routine maternal and child health (MCH) data in tribal primary health centres (PHCs) in India remains underutilised for local decision-making. Top-down digital tools often fail in low-resource settings because they are designed without meaningful input from end-users. Co-creation approaches for digital health in tribal and indigenous settings are largely unexplored. Methods: We conducted an action research study in three tribal PHCs under the Integrated Tribal Development Agency (ITDA), Rampachodavaram, Andhra Pradesh, India. We applied the Three Co's Framework (Co-Define, Co-Design, Co-Refine) to co-create data science solutions for MCH decision-making with five medical officers, 24 auxiliary nurse midwives, and 36 accredited social health activists across two action research cycles (August 2023 to August 2024). Co-creation involved collaborative indicator definition, data modelling, data quality validation, health facility catchment area construction, spatial analysis, and interactive dashboard development. Keller's Data Science Framework was employed using R to structure the analytical pipeline, and Data.org's Data Maturity Assessment (DMA) was used to assess organisational data maturity pre- and post-intervention. Findings: During Co-Define, co-creators identified a fundamental mismatch between system outputs (aggregate statistics for upward reporting) and their operational need for individual-level, geographically disaggregated, prospective information. 
Co-Design produced five interconnected data science solutions: (1) 42 co-defined MCH indicators grounded in clinical workflows; (2) a data model linking individuals, health services, providers, and facilities; (3) a data quality framework using the pointblank R package; (4) health facility catchment area boundaries constructed from scratch using medical officers' local knowledge, enabling spatial analysis that revealed significant clustering of ANC coverage and anaemia prevalence; and (5) an R Shiny dashboard integrating these solutions into an offline-capable interface with lifecycle-organised views and village-level navigation. The DMA showed moderate improvement in organisational data maturity from 5.04 to 5.75 out of 10, with the largest gain in Analysis (+1.90). Co-Refine continued beyond the formal study period, with two transferred medical officers maintaining analytical engagement from new postings. Interpretation: The Three Co's Framework, combined with a data science approach, provided a structured yet flexible method for co-creating locally relevant data science solutions in a tribal setting. The framework's explicit separation of problem definition from solution design was particularly valuable in a context where "the problem" is typically defined externally. Co-creation in tribal digital health settings is feasible and produces solutions that address locally articulated needs.

4
Explainable machine learning for revisiting reported Irritable Bowel Syndrome correlates in a student cohort

Ramirez-Lopez, L.; Kang, P.

2026-04-15 gastroenterology 10.64898/2026.04.13.26350820 medRxiv
Top 0.1%
23.4%

Irritable Bowel Syndrome (IBS) affects a substantial proportion of university students, yet its associated factors remain incompletely characterised in South Asian populations. We reanalysed a publicly available dataset of 550 Bangladeshi students from Hasan et al. (2025), conducting a data audit that identified implausible records, including males reporting menstrual symptoms, and reduced the analytic sample to 506 observations. Using Explainable Boosting Machines (EBMs), which capture non-linear effects and pairwise interactions without sacrificing interpretability, we found that psychological distress, elevated BMI, and academic dissatisfaction were the strongest predictors of IBS (mean AUC = 0.852 across 100 stratified train-test splits). Critically, several findings diverged from the original logistic regression analysis. Physical activity showed a non-linear risk pattern only at high intensity, the association with gender was substantially weaker once metabolic and psychological factors were also accounted for, and malnourishment did not have as strong an impact as in the original study. These divergences likely arise because the machine-learning model captures non-linear effects and interactions that were not represented in the original regression specification. Our findings underscore the value of reanalysing existing datasets with methods suited to capturing complexity and highlight data quality verification as a necessary step in secondary analysis.

5
Towards Integrated Digital Health Systems for Nutrition and Food Security in Uganda: A Cross-Sectional Survey

Samnani, A. A.; Kimbugwe, N.; Nduhuura, E.; Katarahweire, M.; Kanagwa, B.; Crowley, K.; Tierney, A.

2026-04-06 health systems and quality improvement 10.64898/2026.04.05.26350208 medRxiv
Top 0.1%
23.0%

Despite robust policy frameworks, Uganda's digital health landscape is characterised by fragmentation--often termed "Pilotitis"--where stand-alone applications impede the integrated delivery of health, nutrition, and food security services. As part of the IGNITE project, this study mapped existing digital health systems (DHSs), identified systemic gaps, and explored opportunities and resource requirements for sustainable integration of existing health, nutrition, and food security data systems. The IGNITE project adopted a mixed-methods design; however, this paper reports findings from the first phase--a national cross-sectional survey conducted in Uganda. The survey mapped digital health, nutrition, and food security systems, identifying gaps, resource needs, and potential actions. Stakeholders from government, NGOs, academia, UN agencies, and frontline health workers were included using purposive and snowball sampling. Data were collected online and through field support. Of 134 respondents, the 110 with ≥70% survey completion were included in the analysis. While 93% of respondents utilise digital tools (predominantly DHIS2 and mobile apps), only 20% reported full automated integration with national platforms. Critical barriers to interoperability included a lack of technical expertise (90%), insufficient DHIS2 training (82%), different data formats (77%), and infrastructure constraints (75%). Respondents identified workforce development (56%) and DHIS2 use and adoption (29%) as primary opportunities. Immediate priorities include staff training and provision of mobile hardware, while long-term strategies focus on standardised data formats (78%), formalised governance frameworks for integrated platforms (64%), and automated data exchange (56%). Uganda possesses a vibrant but disconnected digital ecosystem. 
Transitioning from isolated "data islands" to a cohesive system requires addressing the massive technical capacity gap and establishing mandated interoperability guidelines. The findings provide a data-driven roadmap for the Ministry of Health and partners to optimise digital health adoption, ensuring that nutrition and food security interventions are supported by a unified, evidence-informed digital architecture.

6
Recovering Clinical Detail in AI-Generated Responses for Low Back Pain Through Prompt Design

Basharat, A.; Hamza, O.; Rana, P.; Odonkor, C. A.; Chow, R.

2026-04-23 pain medicine 10.64898/2026.04.21.26351437 medRxiv
Top 0.1%
22.9%

Introduction: Large language models are increasingly being used in healthcare. In interventional pain medicine, clinical reasoning is essential for procedural planning. Prior studies show that simplified prompts reduce clinical detail in AI-generated responses. It remains unclear whether this reflects knowledge loss or simply prompt-driven suppression of information. Methods: We performed a controlled comparative study using 15 standardized low back pain questions representing common questions in interventional pain medicine. Each question was submitted to ChatGPT under three conditions: a professional-level prompt (DP), a fourth-grade reading-level prompt (D4), and clinician-directed rewriting of the D4 response to a medical level (U4→MD). No follow-up prompting was allowed. Three physicians independently rated responses for accuracy using a 0-2 ordinal scale. Clinical completeness was determined by consensus. Word count and Flesch-Kincaid Grade Level (FKGL) were also measured. Paired t-tests compared conditions. Results: Accuracy was highest with professional prompting (1.76). Accuracy declined with the fourth-grade prompt (1.33; p = 0.00086). When simplified responses were rewritten for clinicians, accuracy returned to baseline (1.76; p ≈ 1.00 vs DP). Clinical completeness followed the same pattern: DP 80.0%, D4 6.7%, U4→MD 73.3%. Fourth-grade responses were shorter and less complex. Upscaled responses were more complex and similar in length to professional responses. Inter-rater reliability was low (Fleiss κ = 0.17), but trends were consistent across conditions. Conclusions: Reduced clinical detail under simplified prompts appears to reflect constrained output rather than loss of knowledge. Clinician-directed reframing restores omitted content. LLM performance in interventional pain depends strongly on prompt design and intended audience.
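
For readers unfamiliar with the Flesch-Kincaid Grade Level mentioned in this abstract, the sketch below applies the standard published formula to pre-counted totals. The function name `fkgl` and the example counts are illustrative assumptions and do not come from the study; real readability tools also differ in how they count sentences and syllables, so scores can vary between implementations.

```python
def fkgl(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid Grade Level from pre-counted totals.

    Standard formula:
        0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    Counting words, sentences, and syllables is left to the caller,
    because tokenisation rules vary between tools.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
example_grade = fkgl(100, 5, 150)
print(round(example_grade, 2))
```

Longer sentences and more syllables per word both raise the grade level, which is consistent with the abstract's observation that fourth-grade responses were shorter and less complex.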

7
Comparison of foundation models and transfer learning strategies for diabetic retinopathy classification

Li, L. Y.; Lebiecka-Johansen, B.; Byberg, S.; Thambawita, V.; Hulman, A.

2026-04-20 health informatics 10.64898/2026.04.17.26351092 medRxiv
Top 0.1%
22.5%

Diabetic retinopathy (DR) is a leading cause of vision impairment, requiring accurate and scalable diagnostic tools. Foundation models are increasingly applied to clinical imaging, but concerns remain about their calibration. We evaluated DINOv3, RETFound, and VisionFM for DR classification using different transfer learning strategies in BRSET (n = 16,266) and mBRSET (n = 5,164). Models achieved high discrimination in binary classification (normal vs retinopathy) in BRSET (AUROC 0.90-0.98), with DINOv3 achieving the best performance under full fine-tuning (AUROC 0.98 [95% CI: 0.97-0.99]). External validation on mBRSET showed decreased performance for all models regardless of the fine-tuning strategy (AUROC 0.70-0.85), though fine-tuning improved performance. Foundation models achieved strong discrimination but poor calibration, generally overestimating DR risk. While the generalist model, DINOv3, benefited from deeper fine-tuning, miscalibration remained evident. These findings underscore the need for better calibration and more comprehensive evaluation of foundation models, both of which are essential in clinical settings. Author summary: Artificial intelligence is increasingly being used to detect eye diseases such as diabetic retinopathy from retinal images. Recent advances have introduced "foundation models," which are trained on large datasets and can be adapted to new tasks. We aimed to evaluate how well these models perform in a clinical prediction context, with a focus not only on accuracy but also on how reliably they estimate disease risk. In this study, we compared different types of foundation models using two independent datasets from Brazil. We found that while these models were generally good at distinguishing between healthy and diseased eyes, their predicted risks were often poorly calibrated. In other words, the estimated probabilities did not consistently reflect the true likelihood of disease. 
We also examined whether adapting the models to the target population could improve performance. Although this approach led to improvements, calibration issues remained. However, post-training correction improved the agreement between predicted risks and observed outcomes. Our findings highlight an important gap between model performance and clinical usefulness. We suggest that improving the reliability of risk estimates is essential before such systems can be safely used in healthcare.
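
As an illustration of what "overestimating risk" means in this abstract, the sketch below computes two simple calibration summaries from predicted probabilities and binary outcomes. It is a minimal example under stated assumptions: the function names `calibration_gap` and `expected_calibration_error` and the toy data are hypothetical, and the abstract does not say which calibration metrics or correction method the authors used.

```python
def calibration_gap(probs, labels):
    """Mean predicted risk minus observed prevalence.

    A positive value means the model overestimates risk on average.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")
    return sum(probs) / len(probs) - sum(labels) / len(labels)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned calibration error with equal-width probability bins.

    For each bin, compare the mean predicted probability with the observed
    event rate, then average the absolute differences weighted by bin size.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    total = len(probs)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        error += len(members) / total * abs(mean_pred - event_rate)
    return error


# Toy data (not from the study): predictions rank cases sensibly but run too high.
toy_probs = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
print(calibration_gap(toy_probs, toy_labels))
print(expected_calibration_error(toy_probs, toy_labels, n_bins=2))
```

Discrimination metrics such as AUROC depend only on how cases are ranked, so a model can score well on them while still producing probabilities that are systematically too high, which is the gap the abstract describes.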

8
Digital Health and Data Utilisation for Improved Primary Health Services Delivery: Multi-Site Perspectives from Quality Improvement Teams in Council Hospitals in Tanzania

Matimo, C. R.; Kacholi, G.; Mollel, H. A.

2026-04-17 health systems and quality improvement 10.64898/2026.04.10.26350674 medRxiv
Top 0.1%
22.0%

Background: Digital health plays an indispensable role in facilitating data analysis and use for enhancing healthcare delivery across health settings. However, there is scant information on the extent to which digital health influences the improvement of primary health services delivery through data use. This study examined the determinants that influence the use of digital health to improve health service delivery in council hospitals in Tanzania. Methods: A cross-sectional design was employed in six regions, involving 12 council hospitals. We used a self-administered questionnaire to collect data from 203 members of hospital quality improvement teams. Descriptive analysis was used to determine the frequency, proportion, and mean of responses, while bootstrapping analysis was conducted to test the statistical significance of the influence of digital health factors on data use for improving health service delivery. Results: Responses showed moderate agreement on data compatibility for planning and decision-making, with 40.4% of respondents agreeing it supports ordering commodities, 43.8% for staff allocation, and 38.4% for planning. However, dissatisfaction was higher for user-friendliness (47.8%), reliability (up to 65.5%), and usefulness (up to 63.5%). Overall, 50.2% (M=2.74±0.87) disagreed that digital systems effectively support data use. Structural model analysis confirmed a significant positive influence of usefulness (β=0.199, p<0.001) and access to quality data (β=0.729, p<0.001) on data use, which strongly impacted service delivery (β=0.593, p<0.001), despite some factors showing no direct influence. Conclusion: The study finds that current digital health initiatives only modestly improve the user-friendliness, reliability, and usefulness of data systems, partly due to fragmented, non-interoperable platforms that burden data management. 
However, compatibility, usability, reliability, and usefulness of digital tools significantly enhance access to quality data and data-driven decisions. The study recommends strengthening and integrating existing systems and providing continuous digital health training to institutionalize data-informed decision-making.

9
Imbalance-Aware Optimal Transport Learning for Cost-Effective Diabetic Retinopathy Screening

SHI, M.; Afolabi, S. O.

2026-04-18 ophthalmology 10.64898/2026.04.16.26351035 medRxiv
Top 0.1%
19.3%

Background: Diabetic retinopathy (DR) is one of the leading causes of vision loss and blindness. AI models have been instrumental in providing an alternative to real-life medical treatment, which is costly and sometimes not readily available in developing and underdeveloped nations. However, most existing AI models are developed with high-quality clinical images, which makes them difficult to use in low-resource settings. This research therefore focuses on bridging this gap by developing a low-resource, mobile-friendly, deployable deep learning (DL) model for the detection of DR using an imbalance-aware optimal transport (OT) learning approach. Methods: We trained our proposed framework with both high-quality hospital-grade images and low-resource smartphone-acquired images, and evaluated it on the original test set from the smartphone domain. We also curated three levels of smartphone image degradation and reported results from multiple experiments with bootstrapping. All models were evaluated using AUC, sensitivity, and specificity. Our results were compared with empirical risk minimization (ERM), Prototype OT, and Sinkhorn OT methods. Results: We assessed four strong backbone architectures. With our framework, Mobilevit-s achieved the best performance: an AUC of 87%, sensitivity of 89%, and specificity of 95%. The 95% confidence intervals were approximately 84% to 89% for AUC, 81% to 96% for sensitivity, and 93% to 96% for specificity. This corresponds to a performance increase of more than 3-5% over baseline methods. Conclusion: Our framework shows promising results for low-resource DR screening, with the potential to benefit less-advantaged groups and developing nations. Keywords: diabetic retinopathy, cost-effective AI, optimal transport, smartphone screening, deep learning.

10
Stakeholder perspectives on the use of enhanced mobile phone capabilities for public health surveillance for non-communicable disease risk factors: A qualitative study

Mwaka, E. S.; Nabukenya, S.; Kasiita, V.; Bagenda, G.; Rutebemberwa, E.; Ali, J.; Gibson, D.

2026-04-23 health informatics 10.64898/2026.04.22.26351443 medRxiv
Top 0.1%
17.3%

Background: Mobile phone-based tools are increasingly used to collect data on non-communicable disease (NCD) risk factors, particularly in low-resource settings where traditional data collection systems face operational and infrastructural constraints. This study examined stakeholder perspectives on the use of enhanced mobile phone-based capabilities to support the collection of public health surveillance data on NCD risk factors in low-resource settings. Methods: An exploratory qualitative study was conducted between November 2022 and July 2023. Twenty in-depth interviews were conducted with public health specialists, ethicists, NCD researchers, health informaticians, and policy makers in Uganda. Thematic analysis was used to interpret the results. Results: Four themes emerged from the data, including benefits of using mobile phone capabilities for NCD risk factor data collection; ethical, legal, and social implications; perceived challenges of using such mobile phone capabilities; and proposed solutions to improve the utility of phone-based capabilities in data collection on NCD risk factors. Participants recognized the potential of mobile technologies to improve data collection efficiency and expand access to hard-to-reach populations. However, concerns emerged regarding inadequate informed consent, risks to privacy and confidentiality, unclear data ownership, and vulnerabilities created by inconsistent enforcement of data protection laws. Social concerns included low digital literacy, unequal access to mobile devices, and fear of stigmatization. Participants emphasized the need for transparent communication, robust data governance, and community engagement. Conclusion: Mobile phone-based systems can strengthen the collection of NCD risk factor data in low-resource settings; however, their benefits depend on addressing key ethical, legal, and social challenges. 
To ensure responsible deployment, digital health initiatives must prioritize participant autonomy, data protection, equity, and trust building. Integrating contextualized ethical, legal, and social considerations into design and policy frameworks will be essential to leveraging mobile technologies in ways that support inclusive and effective NCD prevention and control.

11
Tuberculosis in households with infectious cases in Kampala city: Harnessing health data science for new insights on an ancient disease with persistent, unresolved problems (DS-IAFRICA TB) study protocol

Nassinghe, E.; Musinguzi, D.; Takuwa, M.; Kamulegeya, R.; Nabatanzi, R.; Namiiro, S.; Mwikirize, C.; Katumba, A.; Kivunike, F. N.; Ssengooba, W.; Nakatumba-Nabende, J.; Kateete, D. P.

2026-04-25 infectious diseases 10.64898/2026.04.23.26351571 medRxiv
Top 0.1%
17.1%

Tuberculosis (TB) is prevalent in Uganda and overlaps with a high rate of HIV/TB coinfection. While nearly all hospital-based TB cases in Kampala, the capital of Uganda, show clear TB symptoms, 30% or more of undiagnosed TB cases found through active screening are asymptomatic. Additionally, the host risk factors for TB in Kampala cannot be distinguished from environmental risk factors. These TB-specific challenges are just part of the complexity, especially in areas with high HIV/AIDS burden. Data science techniques, especially Artificial Intelligence (AI) and Machine Learning (ML) algorithms, could help untangle this complexity by identifying factors related to the host, pathogen, and environment, which are difficult to explain or predict with traditional/conventional methods. In this project, we will use health data science approaches (AI/ML) to identify factors driving TB transmission within households and reasons for anti-TB treatment failure. We will utilize the computational resources at Makerere University and available demographic, clinical, and laboratory data from TB patients and their contacts to develop AI and ML algorithms. These will aim to: (1) identify patients at baseline (month 0) unlikely to convert their sputum or culture results by months 2 and 5, thus at risk of failing TB treatment; (2) identify household contacts of TB cases who are at risk of developing TB disease, as well as contacts who may resist TB infection despite repeated exposure to M. tuberculosis. Achieving these objectives will provide evidence that data science methods are effective for early detection of potential TB cases and high-risk patients, thereby helping to reduce TB transmission in the community. The study protocol received approval from the School of Biomedical Sciences IRB, protocol number SBS-2023-495.

12
Mapping Data Sources for Local Decision-Making on Maternal and Child Health in Tribal Primary Health Centre Settings of Andhra Pradesh, India

Mitra, A.; Jayaraman, G.; Ondopu, B.; Malisetty, S. K.; Niranjan, R.; Shaik, S.; Soman, B.; Gaitonde, R.; Bhatnagar, T.; Niehaus, E.; K.S, S.; Roy, A.

2026-03-30 public and global health 10.64898/2026.03.28.26349587 medRxiv
Top 0.1%
15.2%

Background: Health systems in low- and middle-income countries are frequently described as "data rich, information poor", collecting substantial amounts of data that rarely inform local decision-making. In tribal settings, this challenge is compounded by geographic isolation, fragmented governance, sectoral silos, and the absence of disaggregated tribal health data within routine health information systems. We conducted a systematic mapping of data sources available for maternal and child health (MCH) decision-making at tribal Primary Health Centres (PHCs) in Andhra Pradesh, India. Methods: Using a participatory data discovery approach embedded within an action research project, we mapped data sources across three PHCs under the Integrated Tribal Development Agency (ITDA), Rampachodavaram, Alluri Sitharama Raju District, Andhra Pradesh, India. Data discovery proceeded through three phases: document review, key informant interviews with Medical Officers and frontline health workers, and stakeholder validation. Sources were classified using the HEALTHY framework (Healthcare, Education, Access, Labour, Transportation, Housing, Income) and Keller's data discovery typology (Designed, Administrative, Opportunity, Procedural). Accessibility was assessed based on whether Medical Officers could retrieve data for local planning and decision-making. Results: We identified 28 distinct data sources relevant to MCH decision-making. Healthcare dominated (57.1%), while determinant domains remained underrepresented: Housing (10.7%), Income (10.7%), Education (7.1%), Labour (7.1%), Transportation (3.6%), and Access to healthy choices (3.6%). By data origin, Administrative sources predominated (46.4%), followed by Opportunity (21.4%), Procedural (17.9%), and Designed (14.3%). Despite 67.9% of sources having digital components, only 32.1% were fully accessible to Medical Officers, with 10.7% partially accessible and 57.1% inaccessible at the PHC level. 
Accessibility barriers were consistent across data categories, ranging from 50.0% to 66.7% inaccessibility. Conclusions: The tribal PHC data ecosystem exhibits a fundamental mismatch between data generation and local utility. Data is predominantly collected for administrative reporting rather than local decision-making. Addressing MCH outcomes in tribal populations requires reorienting health information systems toward local needs.

13
A Deployable Explainable Deep Learning System for Tuberculosis Detection from Chest X-Rays in Resource-Constrained High-Burden Settings

Agumba, J.; Erick, S.; Pembere, A.; Nyongesa, J.

2026-04-01 radiology and imaging 10.64898/2026.03.31.26349662 medRxiv
Top 0.1%
15.0%

Objectives: To develop and evaluate a deployable deep learning system with Gradient-weighted Class Activation Mapping (Grad-CAM) for tuberculosis screening from chest radiographs and to assess its classification performance and explainability across desktop and mobile deployment platforms. Materials and methods: This study used publicly available chest X-ray datasets containing Normal and Tuberculosis images. A DenseNet121-based transfer learning model was trained using stratified training, validation, and test splits with data augmentation and class weighting. Model performance was evaluated using accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC). Grad-CAM was used to visualize regions influencing model predictions. The trained model was converted to TensorFlow Lite and deployed in both a Windows desktop application and a Flutter-based mobile application for offline inference and visualization. Results: The model demonstrated strong classification performance on the independent test dataset, with high accuracy and AUC values indicating effective discrimination between Normal and Tuberculosis cases. Grad-CAM visualizations showed that the model focused primarily on anatomically relevant lung regions, particularly the upper and mid-lung fields in Tuberculosis cases. Deployment testing confirmed consistent prediction outputs and Grad-CAM visualizations across both Windows and mobile platforms. Conclusion: The proposed deployable deep learning system with Grad-CAM provides accurate and interpretable tuberculosis screening from chest radiographs and demonstrates feasibility for offline mobile and desktop deployment. This approach has potential as an artificial intelligence-assisted screening and decision support tool in radiology, particularly in resource-limited and remote healthcare settings.

14
Data use practices and challenges for maternal and child health decision-making in tribal primary health centres in Andhra Pradesh, India

Mitra, A.; Jayaraman, G.; Ondopu, B.; Malisetty, S. K.; Niranjan, R.; Shaik, S.; Soman, B.; Gaitonde, R.; Bhatnagar, T.; Niehaus, E.; K.S, S.; Roy, A.

2026-03-31 public and global health 10.64898/2026.03.29.26349634 medRxiv
Top 0.1%
14.6%

Background: Primary health centres in tribal areas of India collect large volumes of maternal and child health (MCH) data through routine health information systems, yet this data rarely informs local clinical or programmatic decision-making. The gap between data collection and data use in tribal settings, where health disparities are most acute, remains poorly documented from the perspective of frontline decision-makers. Methods: We conducted a qualitative study embedded in the diagnostic phase of an Action Research project in three tribal primary health centres under the Integrated Tribal Development Agency (ITDA), Rampachodavaram, Alluri Sitharama Raju District, Andhra Pradesh. Eight key informant interviews were conducted with medical officers (n=5), a district programme officer (n=1), and data entry operators (n=2). Participant observation at weekly convergence meetings and document review of registers and reports supplemented interview data. Transcripts were independently coded by two analysts using Braun and Clarke's reflexive thematic analysis. Findings: Three interconnected domains emerged. First, local MCH decision-makers needed individual-level, geographically disaggregated, prospective information to plan outreach and follow-up, but formal systems provided only retrospective aggregate statistics. Second, three structural constraints prevented formal systems from meeting these needs: digital infrastructure designed for connected settings, upward data flows with no local feedback, and a single-point-of-access governance vulnerability where one data entry operator's mobile phone controlled portal authentication for all facilities in the jurisdiction. Third, decision-makers constructed four complementary information practices (WhatsApp networks, self-built tracking tools, cross-sectoral convergence meetings, and reliance on intermediary-consolidated reports) to bridge the gap.
Interpretation: Complementary information practices are expressions of local ingenuity under structural constraint, not system failures. MCH digital health reform should map and strengthen these practices rather than bypass them. Authentication governance in low-connectivity tribal settings requires urgent policy attention.

15
Governance, Accountability and Post-Deployment Monitoring Preferences for AI Integration in West African Clinical Practice: A Mixed-Methods Study

Uzochukwu, B. S. C.; Cherima, Y. J.; Enebeli, U. U.; Okeke, C. C.; Uzochukwu, A. C.; Omoha, A.; Hassan, B.; Eronu, E. M.; Yusuf, S. M.; Uzochukwu, K. A.; Kalu, E. I.

2026-04-01 health informatics 10.64898/2026.03.30.26349782 medRxiv
Top 0.1%
14.1%

Background: The integration of artificial intelligence (AI) into clinical practice holds transformative potential for healthcare in West Africa, but safe deployment requires context-appropriate governance, accountability, and post-deployment monitoring frameworks. This cross-sectional mixed-methods study examined preferences and concerns of West African clinicians and technical experts regarding AI governance structures, post-deployment surveillance mechanisms, and accountability allocation. Methods: A structured questionnaire was administered to 136 physicians affiliated with the West African College of Physicians (February 22-28, 2026), complemented by 72 key informant interviews with technical leads, AI developers, data scientists, policymakers, and healthcare leaders. Data were analyzed using descriptive statistics, inferential tests, and thematic analysis. Results: Clinicians strongly preferred independent regulatory bodies (40.4%) for overseeing AI tool performance, with high trust ratings (mean:4.3/5), while vendor self-monitoring received minimal support (3.7%, mean:2.4/5). Real-time dashboards were the most favored monitoring approach (41.9%). Clear accountability pathways (94.1%), algorithm transparency (91.9%), and real-time performance data (89.7%) were rated essential by majorities. Major concerns included clinicians being unfairly blamed for AI errors (76.5%), excessive vendor control (72.8%), and absence of clear reporting pathways (69.9%). Qualitative findings emphasized continuous performance tracking for accuracy, fairness, and bias; structured incident reporting; protocols for model drift and failure; and multi-layered governance combining independent oversight, institutional AI committees, and explicit liability frameworks. Conclusion: This study provides the first empirical evidence from West Africa on clinician preferences for AI governance. 
Findings offer actionable guidance for policymakers to build trustworthy, equitable, and safe AI integration frameworks that prioritize transparency, independent oversight, and clinician protection. Keywords: artificial intelligence; AI governance; post-deployment monitoring; accountability; West Africa; clinician preferences; health data science.

16
Spine Reviews: Crowdsourcing Global Spine Expert Knowledge via Digital Ledger Technology

Challier, V.; Diebo, B.; Lafage, V.; Dehouche, N.; Lonjon, G.; Cristini, J.; SpineDAO

2026-04-13 health informatics 10.64898/2026.04.11.26350678 medRxiv
Top 0.1%
12.8%

Study Design: Prospective observational study using a novel digital ledger technology (DLT)-based crowdsourcing platform. Objective: To develop and evaluate Spine Reviews, a blockchain-based platform for aggregating spine treatment recommendations from an international specialist panel, and to validate the clinical coherence of the resulting dataset. Summary of Background Data: Predictive models for low back pain treatment are limited by small, homogeneous datasets that fail to capture inter-clinician variability. Traditional multi-center data collection is expensive, slow, and geographically constrained. DLT-based crowdsourcing with cryptographic credentialing may overcome these barriers. Methods: Five hundred synthetic patient vignettes (digital twins) were generated; 463 retained after quality control. A review platform was built on the Solana blockchain using non-transferable Soulbound Tokens (SBTs) for credentialing and smart-contract compensation. Fifty-two specialists from 7 countries provided 4+ reviews per vignette across four treatment tiers, without access to imaging or physical examination. Mixed-effects regression with reviewer random intercepts partitioned decision variability. Results: The platform collected 2,066 completed reviews (97.7%) over 37 days at USD 0.97/review. Variance decomposition revealed that 36.7% of treatment tier variability was attributable to patient presentation, 19.2% to reviewer practice style, and 44.1% to their interaction. Neurological deficits (beta=0.39), symptom duration (beta=0.12), and pain (beta=0.09) independently predicted treatment escalation (all p<0.001). Gwet's AC1 was almost perfect for emergency (0.92) and substantial for conservative decisions (0.67). Reviewer confidence in treatment recommendations decreased with escalating tier severity (conservative 4.59/5 vs surgical 4.05/5), suggesting appropriate uncertainty calibration. 
Conclusions: DLT with SBT credentialing enables rapid, global, cost-effective aggregation of clinically coherent expert judgment. The three-component variance structure quantifies clinical equipoise in spine care and establishes that predictive models require diverse, multi-reviewer training data. Keywords: digital ledger technology; blockchain; crowdsourcing; clinical decision-making; low back pain; Soulbound Tokens

17
Development and validation of a machine learning model for community-based tuberculosis screening among persons aged >= 15 years in South Africa and Zambia

Zimmer, A. J.; Loharja, H.; Fentahun Muchie, K.; Koeppel, L.; Ayles, H.; Castro, M. d. M.; Christodoulou, E.; Fox, G. J.; Gaeddert, M.; Hamada, Y.; Isaacs, C.; Kapata, N.; Chanda-Kapata, P.; Karimi, K.; Kasese, N.; Kerkhoff, A.; Law, I.; Maier-Hein, L.; Marx, F. M.; Maimbolwa, M. M.; Moyo, S.; Mthiyane, T.; Muyoyeta, M.; Rocklöv, J.; Schaap, A.; Yerlikaya, S.; Opata, M.; Denkinger, C. M.

2026-04-04 public and global health 10.64898/2026.03.30.26349632 medRxiv
Top 0.1%
12.5%

Introduction: Current tuberculosis (TB) screening tools, such as the WHO four-symptom screen (W4SS), lack sufficient sensitivity and specificity for effective community-based active case finding, contributing to both missed diagnoses and unnecessary diagnostic evaluations. This study aimed to develop and validate a machine learning (ML) model to improve TB risk prediction among persons aged >=15 years in community settings of Zambia and South Africa. Methods: A large, harmonized dataset was created from four community-based TB prevalence surveys in South Africa and Zambia (N=169,813), restricted to individuals not under treatment at the time of survey. A binary reference outcome was defined based on available microbiological and radiographic data, grouping individuals as either 'Possible TB' or 'Unlikely TB'. An XGBoost model was trained on 80% (N=135,854) of the data using demographic, clinical, and socio-economic variables, and model interpretability was assessed using SHapley Additive exPlanations (SHAP) values. Internal validation was performed using a 20% hold-out test set (N=33,959). Model performance was assessed using discrimination, calibration, and clinical utility measures compared to the W4SS and against WHO's 2025 Target Product Profile (TPP) for a tool in a two-step screening algorithm. Results: Overall, 16,413 (9.7%) of individuals were labelled as 'Possible TB'. On the test set, the XGBoost model yielded an area under the curve (AUC) of 79.7% (95% CI: 78.7, 80.7), outperforming the W4SS (AUC 57.0%; 95% CI: 56.1, 57.8). The XGBoost model achieved 81.5% sensitivity (95% CI: 77.6, 84.9) at a 60% specificity threshold. This exceeded the W4SS, which achieved only 38.2% sensitivity (95% CI: 36.5, 39.9) on the same dataset. SHAP analysis identified age, previous TB treatment, times treated for TB and unemployment as the primary contributors to risk. 
Conclusion: The ML XGBoost model shows promise as a screening tool to support community-based active case finding activities prior to diagnostic testing. However, performance remained below TPP targets, and adding variables, e.g. on geolocation, could be considered.
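
The abstract above reports sensitivity at a fixed 60% specificity threshold. As a rough illustration of how such an operating point can be read off a set of risk scores, here is a minimal sketch in plain Python. The function name `sensitivity_at_specificity` and the ten scores and labels are hypothetical, made up for this example; they are not the study's data or code.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (threshold, sensitivity, specificity) for the lowest score
    cut-off whose specificity is at least target_specificity.

    A case is screened positive when score >= threshold.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative case")

    # Candidate cut-offs in ascending order: every observed score, plus one
    # value above the maximum (nobody is screened positive, specificity 1.0).
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for threshold in candidates:
        specificity = sum(s < threshold for s in negatives) / len(negatives)
        if specificity >= target_specificity:
            # Raising the cut-off further can only lower sensitivity, so the
            # first cut-off that meets the target gives the best sensitivity.
            sensitivity = sum(s >= threshold for s in positives) / len(positives)
            return threshold, sensitivity, specificity
    raise ValueError("target_specificity must not exceed 1.0")


# Hypothetical risk scores and reference labels (1 = 'Possible TB').
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

threshold, sens, spec = sensitivity_at_specificity(scores, labels, 0.6)
print(f"threshold={threshold}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Fixing specificity first and then reporting sensitivity makes two screening tools comparable at the same false-positive burden, which is presumably why the abstract compares the model and the W4SS this way.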

18
Prescribed Cardiac Wearables in Routine Care: a qualitative study of Patient Experiences

Zeng, A.; O'Hagan, E. T.; Trivedi, R.; Ford, B.; Perry, T.; Turnbull, S.; Sheahen, B.; Mulley, J.; Sedhom, M.; Choy, C.; Biasi, A.; Walters, S.; Miranda, J. J.; Chow, C. K.; Laranjo, L.

2026-04-11 health systems and quality improvement 10.64898/2026.04.09.26350550 medRxiv
Top 0.1%
12.5%

Background: Continuous adhesive patch electrocardiographic (ECG) wearables are increasingly prescribed. Patient experience with these devices can influence adherence, but research in this area is limited. This study aimed to explore the perceptions and experiences of patients receiving wearable cardiac monitoring technology as part of their routine care through the lens of treatment burden. Methods: This was a qualitative study with semi-structured phone interviews conducted between February and May 2024. We recruited participants from primary care and outpatient clinics using maximum variation sampling to ensure diversity in sex, ethnicity, and education levels. Interviews were audio-recorded, transcribed, and analysed using reflexive thematic analysis. Results: Sixteen participants (mean age 51 years, 63% female) were interviewed (average duration: 33 minutes). Three themes were developed: 1) "Experience using the device: Burden vs Ease of Use", which captured participants' perceptions of how easily they could integrate the device in their daily lives; 2) "Individual variability in responses to ECG self-monitoring" covered participants' emotional and cognitive response to knowing their heart rhythm was monitored; and 3) "The care process shapes patient experiences" reflected support preferences during the set-up and monitoring period and the uncertainty regarding timely clinical and device feedback. Conclusions: Patients valued cardiac wearables for facilitating diagnosis and felt reassured knowing they were clinically monitored. However, gaps in information provided to patients seemed to cause anxiety for some participants. These concerns could be mitigated through clearer clinician communication and patient education at the time of prescription.

19
Mapping Stakeholder Engagement in Endometriosis Care Innovation: Insights from the VendoR Project

Mahdikhani, S.; Cleary, F.; Cummins, S.

2026-04-07 health systems and quality improvement 10.64898/2026.04.01.26349826 medRxiv
Top 0.1%
12.3%

Objectives: Endometriosis affects approximately 10% of reproductive age women worldwide, yet care pathways remain fragmented and treatments have limitations. This study aimed to identify and categorize key stakeholders in endometriosis care in Ireland, assess their influence and interest in the digital health initiative, and identify drivers and barriers affecting uptake of innovative approaches to care. Methods: A virtual stakeholder mapping workshop was conducted with participants from healthcare, policy, education, technology, academia, and patient communities. Using a structured MS Teams Whiteboard, participants generated a stakeholder list, positioned stakeholders on an Influence-Interest Matrix, and provided qualitative insights on factors enabling or constraining engagement with digital health innovation. Results: Stakeholders were distributed across all four quadrants of the matrix. High-interest/high-influence stakeholders included the HSE, specialist centres, general practitioners, and the Endometriosis Association of Ireland. High-interest/low-influence groups comprised patients, families, and online communities, while policymakers, hospital managers, and the education sector were identified as high-influence but low-interest actors. Key drivers included strong patient advocacy, institutional support such as engagement from the HSE, and growing awareness of digital health tools. Major barriers encompassed prolonged diagnostic delays, resource constraints, gaps in clinical knowledge, technology anxiety, and challenges sustaining engagement. Conclusions: Stakeholder mapping provided an evidence-informed foundation for the VendoR project, revealing engagement gaps and leverage points critical for improving endometriosis care innovation. 
The findings highlight the need for intentional, well-resourced strategies that elevate patient voices, address systemic barriers, and ensure balanced representation, supporting the co-design, co-creation, and co-production of digital health interventions for sustainable, patient-centred care.

20
Pneumonia Detection in Paediatric Chest X-Rays using Ensembled Large Language Models

Tan, J.; Tang, P. H.

2026-04-12 radiology and imaging 10.64898/2026.04.10.26347909 medRxiv
Top 0.2%
10.3%

Background: Paediatric pneumonia is a leading cause of childhood morbidity and mortality worldwide. Chest X-rays (CXR) are an important diagnostic tool in the diagnosis of pneumonia, but shortages in specialist radiology services lead to clinically significant delays in CXR reporting. The ability to communicate findings both to clinicians and laypersons allows multimodal large language models (MLLMs) to be deployed throughout clinical workflows, from image analysis to patient communication. However, MLLMs currently underperform state-of-the-art deep learning classifiers. Objective: To evaluate the diagnostic accuracy of ensemble strategies with MLLMs compared to the baseline average agent for paediatric radiological pneumonia detection. Methods: We conducted a retrospective cohort study using paediatric CXRs from two independent hospital datasets totalling 2300 CXRs. Fifteen MedGemma-4B-it agents independently classified each CXR into five pneumonia likelihood categories. Majority voting, soft voting, and GPTOSS-20B aggregation were compared against the average agent performance. The primary metric evaluated was OvR AUROC. Secondary metrics included accuracy, sensitivity, specificity, F1-score, Cohen's kappa, and OvO AUROC. Results: Soft voting achieved improvements in OvR AUROC (p_balanced = 0.0002, p_real-world = 0.0003), accuracy (p_balanced = 0.0008, p_real-world < 0.0001), Cohen's kappa (p_balanced = 0.0006, p_real-world = 0.0054) and OvO AUROC (p_balanced < 0.0001, p_real-world = 0.0011) across both datasets, and a superior F1-score (p_balanced = 0.0028) for the balanced dataset. Conclusion: Soft voting enhances MedGemma's diagnostic discriminatory performance for paediatric radiological pneumonia detection. Our system enables privacy-preserving, near real-time clinical decision support with explainable outputs, having potential for integration into emergency departments. Our system's high specificity supports triage by flagging high-risk radiological pneumonia cases.
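
The abstract above compares majority voting with soft voting over fifteen agents that each assign one of five likelihood categories. The difference between the two aggregation rules can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how categories are turned into scores, so the `CATEGORY_SCORE` mapping, the tie-breaking rule, and the example agent outputs are all hypothetical.

```python
from collections import Counter
from statistics import mean

# Assumed mapping from a five-level likelihood category (0 = very unlikely,
# 4 = very likely) to a score in [0, 1]. The study's actual mapping is unknown.
CATEGORY_SCORE = {0: 0.0, 1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0}


def majority_vote(categories):
    """Return the most frequent category; ties go to the lower category."""
    counts = Counter(categories)
    top = max(counts.values())
    return min(c for c, n in counts.items() if n == top)


def soft_vote(categories):
    """Return the mean mapped score across agents (a continuous value)."""
    return mean(CATEGORY_SCORE[c] for c in categories)


# Hypothetical outputs of 15 agents for a single chest X-ray.
agent_outputs = [4, 4, 3, 3, 3, 3, 2, 4, 4, 3, 1, 3, 4, 3, 2]

hard_label = majority_vote(agent_outputs)
soft_score = soft_vote(agent_outputs)
print(f"majority vote: category {hard_label}, soft vote: score {soft_score:.3f}")
```

Majority voting discards how the minority voted, whereas the averaged score keeps that information and yields a finer-grained ranking of cases, which is one plausible reason an averaging rule could improve a ranking metric such as AUROC.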